641 research outputs found

Statistical clustering of temporal networks through a dynamic stochastic block model

Authors: Matias Catherine, Miele Vincent
Publication date: 22/06/2016

Statistical node clustering in discrete-time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the node groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edge values). Our approach, motivated by the importance of controlling for label switching issues across the different time steps, focuses on detecting groups characterized by a stable within-group connectivity behavior. We study identifiability of the model parameters, and propose an inference procedure based on a variational expectation maximization algorithm as well as a model selection criterion to select the number of groups. We carefully discuss our initialization strategy, which plays an important role in the method, and compare our procedure with existing ones on synthetic datasets. We also illustrate our approach on dynamic contact networks: one of encounters among high school students and two others on animal interactions. An implementation of the method is available as an R package called dynsbm.
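To make the generative model in this abstract concrete, here is a minimal Python sketch, not the dynsbm implementation. It assumes binary undirected graphs and a connectivity matrix that stays constant over time, which is a simplification. The function name `simulate_dynamic_sbm` and all of its parameters are hypothetical.

```python
import numpy as np

def simulate_dynamic_sbm(n_nodes, n_steps, init_probs, transition, connect, seed=0):
    """Simulate group memberships and binary undirected graphs over time.

    Each node's group follows its own Markov chain, independently of the
    other nodes. Given the groups at time t, edges are independent
    Bernoulli draws with probability connect[group_i, group_j].
    """
    rng = np.random.default_rng(seed)
    init_probs = np.asarray(init_probs, dtype=float)
    transition = np.asarray(transition, dtype=float)
    connect = np.asarray(connect, dtype=float)
    n_groups = len(init_probs)

    # Group trajectories: one independent Markov chain per node.
    groups = np.empty((n_steps, n_nodes), dtype=int)
    groups[0] = rng.choice(n_groups, size=n_nodes, p=init_probs)
    for t in range(1, n_steps):
        for i in range(n_nodes):
            groups[t, i] = rng.choice(n_groups, p=transition[groups[t - 1, i]])

    # Static SBM at each time step, conditional on the current groups.
    adjacency = np.zeros((n_steps, n_nodes, n_nodes), dtype=int)
    for t in range(n_steps):
        probs = connect[np.ix_(groups[t], groups[t])]  # pairwise edge probabilities
        upper = np.triu(rng.random((n_nodes, n_nodes)) < probs, k=1)
        adjacency[t] = upper + upper.T  # symmetric, no self-loops
    return groups, adjacency

groups, adjacency = simulate_dynamic_sbm(
    n_nodes=30,
    n_steps=5,
    init_probs=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    connect=[[0.8, 0.1], [0.1, 0.8]],
)
```

The high diagonal values in `connect` give each group the kind of stable within-group connectivity that the abstract says the method targets.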

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Convergence of the groups posterior distribution in latent or stochastic block models

Authors: Mariadassou Mahendra, Matias Catherine
Publication venue: 'Bernoulli Society for Mathematical Statistics and Probability'
Publication date: 01/01/2014

We propose a unified framework for studying both latent and stochastic block models, which are used to simultaneously cluster the rows and columns of a data matrix. In this new framework, we study the behaviour of the groups posterior distribution, given the data. We characterize whether it is possible to asymptotically recover the actual groups on the rows and columns of the matrix, relying on a consistent estimate of the parameter. In other words, we establish sufficient conditions for the groups posterior distribution to converge (as the size of the data increases) to a Dirac mass located at the actual (random) groups configuration. In particular, we highlight some cases where the model assumes symmetries in the matrix of connection probabilities that prevent recovering the original groups. We also discuss the validity of these results when the proportion of non-null entries in the data matrix converges to zero. Comment: Published at http://dx.doi.org/10.3150/13-BEJ579 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm

arXiv.org e-Print Archive

Modeling heterogeneity in random graphs through latent space models: a selective review

Authors: Matias Catherine, Robin Stéphane
Publication date: 25/09/2014

We present a selective review on probabilistic modeling of heterogeneity in random graphs. We focus on latent space models, and more particularly on stochastic block models and their extensions, which have undergone major developments in the last five years.

arXiv.org e-Print Archive

HAL Evry

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

HAL Descartes

ProdInra

Hal-Diderot

On efficient estimators of the proportion of true null hypotheses in a multiple testing setup

Authors: Matias Catherine, Nguyen Van Hanh
Publication date: 08/01/2013

We consider the problem of estimating the proportion $\theta$ of true null hypotheses in a multiple testing context. The setup is classically modeled through a semiparametric mixture with two components: a uniform distribution on the interval $[0,1]$ with prior probability $\theta$ and a nonparametric density $f$. We discuss asymptotic efficiency results and establish that two different cases occur, depending on whether or not $f$ vanishes on a set with non-null Lebesgue measure. In the first case, we exhibit estimators converging at parametric rate, compute the optimal asymptotic variance and conjecture that no estimator is asymptotically efficient (i.e. attains the optimal asymptotic variance). In the second case, we prove that the quadratic risk of any estimator does not converge at parametric rate. We illustrate those results on simulated data.
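The two-component mixture described in this abstract can be written out as follows. The symbol $g$ for the density of the observed $p$-values is an assumed notation that does not appear in the abstract; $\theta$ and $f$ are as above.

```latex
g(x) \;=\; \theta \,\mathbf{1}_{[0,1]}(x) \;+\; (1-\theta)\, f(x),
\qquad x \in [0,1].
```

Under this formula, $g(x) = \theta$ at every $x$ where $f(x) = 0$, which gives some intuition for why the abstract separates the case where $f$ vanishes on a set of non-null Lebesgue measure.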

arXiv.org e-Print Archive

HAL Evry

Hal-Diderot

Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation

Authors: Matias Catherine, Nguyen Van Hanh
Publication date: 03/04/2013

In a multiple testing context, we consider a semiparametric mixture model with two components where one component is known and corresponds to the distribution of $p$-values under the null hypothesis and the other component $f$ is nonparametric and stands for the distribution under the alternative hypothesis. Motivated by the issue of local false discovery rate estimation, we focus here on the estimation of the nonparametric unknown component $f$ in the mixture, relying on a preliminary estimator of the unknown proportion $\theta$ of true null hypotheses. We propose and study the asymptotic properties of two different estimators for this unknown component. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence as well as corresponding rate for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.

arXiv.org e-Print Archive

HAL Evry

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Numérisation de Documents Anciens Mathématiques

Hal-Diderot

Asymptotic normality and efficiency of the maximum likelihood estimator for the parameter of a ballistic random walk in a random environment

Authors: Falconnet Mikael, Loukianova Dasha, Matias Catherine
Publication venue: 'Allerton Press'
Publication date: 14/11/2013

We consider a one-dimensional ballistic random walk evolving in a parametric independent and identically distributed random environment. We study the asymptotic properties of the maximum likelihood estimator of the parameter based on a single observation of the path until the time it reaches a distant site. We prove an asymptotic normality result for this consistent estimator as the distant site tends to infinity and establish that it achieves the Cramér-Rao bound. We also explore in a simulation setting the numerical behaviour of asymptotic confidence regions for the parameter value.

arXiv.org e-Print Archive

HAL Evry

HAL Descartes

Adaptive procedures in convolution models with known or partially known noise distribution

Authors: Butucea Cristina, Matias Catherine, Pouet Christophe
Publication date: 22/03/2007

In a convolution model, we observe random variables whose distribution is the convolution of some unknown density f and some known or partially known noise density g. In this paper, we focus on statistical procedures which are adaptive with respect to the smoothness parameter tau of the unknown density f, and also (in some cases) to some unknown parameter of the noise density g. In the first part, we assume that g is known and polynomially smooth. We provide goodness-of-fit procedures for the test H_0:f=f_0, where the alternative H_1 is expressed with respect to the L_2-norm. Our adaptive (w.r.t. tau) procedure behaves differently according to whether f_0 is polynomially or exponentially smooth. A payment for adaptation is noted in both cases and, to compute it, we provide a non-uniform Berry-Esseen type theorem for degenerate U-statistics. In the first case we prove that the payment for adaptation is optimal (thus unavoidable). In the second part, we study a wider framework: a semiparametric model, where g is exponentially smooth and stable, and its self-similarity index s is unknown. In order to ensure identifiability, we restrict our attention to polynomially smooth, Sobolev-type densities f. In this context, we provide a consistent estimation procedure for s. This estimator is then plugged into three different procedures: estimation of the unknown density f, estimation of the functional \int f^2, and test of the hypothesis H_0. These procedures are adaptive with respect to both s and tau and attain the rates which are known to be optimal for known values of s and tau. As a by-product, when the noise is known and exponentially smooth, our testing procedure is adaptive for testing Sobolev-type densities. Comment: 35 pages + 8-page appendix

arXiv.org e-Print Archive

A semiparametric extension of the stochastic block model for longitudinal networks

Authors: Matias Catherine, Rebafka Tabea, Villers Fanny
Publication date: 25/07/2017

To model recurrent interaction events in continuous time, an extension of the stochastic block model is proposed where every individual belongs to a latent group and interactions between two individuals follow a conditional inhomogeneous Poisson process with intensity driven by the individuals' latent groups. The model is shown to be identifiable and its estimation is based on a semiparametric variational expectation-maximization algorithm. Two versions of the method are developed, using either a nonparametric histogram approach (with an adaptive choice of the partition size) or kernel intensity estimators. The number of latent groups can be selected by an integrated classification likelihood criterion. Finally, we demonstrate the performance of our procedure on synthetic experiments, analyse two datasets to illustrate the utility of our approach and comment on competing methods.
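The data-generating mechanism in this abstract can be illustrated with a short Python sketch. It simulates interaction times by thinning a homogeneous Poisson process, a standard simulation technique and not the estimation procedure of the paper. It assumes undirected pairs and an intensity bounded by `max_rate`. The names `simulate_poisson_sbm` and `example_intensity` are hypothetical.

```python
import numpy as np

def example_intensity(q, l, t):
    """Hypothetical intensity for a pair with groups (q, l) at time t."""
    base = 2.0 if q == l else 0.5          # more activity within a group
    return base * 0.5 * (1.0 + np.sin(t))  # always between 0 and base

def simulate_poisson_sbm(groups, intensity, horizon, max_rate, seed=0):
    """Simulate interaction times on [0, horizon] for every pair of nodes.

    Conditional on the latent groups, each pair (i, j) gets an independent
    inhomogeneous Poisson process whose intensity depends only on the two
    groups and on time. Requires intensity(...) <= max_rate everywhere.
    """
    rng = np.random.default_rng(seed)
    n_nodes = len(groups)
    events = {}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Candidate times from a homogeneous process of rate max_rate.
            n_candidates = rng.poisson(max_rate * horizon)
            times = np.sort(rng.uniform(0.0, horizon, size=n_candidates))
            # Thinning: keep each candidate with probability intensity / max_rate.
            rates = np.array([intensity(groups[i], groups[j], t) for t in times])
            keep = rng.random(n_candidates) < rates / max_rate
            events[(i, j)] = times[keep]
    return events

events = simulate_poisson_sbm(
    groups=[0, 0, 1, 1], intensity=example_intensity, horizon=10.0, max_rate=2.0
)
```

Each value in `events` is a sorted array of interaction times for one pair of nodes.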

arXiv.org e-Print Archive

Hal-Diderot